“And all the pieces matter...” Hybrid Testing Methods for Android App's Privacy Analysis
Smartphones have become integral to the everyday life of billions of people worldwide, who use them
for activities such as gaming, interacting with peers, or working. While extremely
useful, smartphone apps also have drawbacks, as they can affect the security and privacy of users.
Android devices hold a lot of personal data from users, including their social circles (e.g., contacts),
usage patterns (e.g., app usage and visited websites) and their physical location. As in most software
products, Android apps often include third-party code (Software Development Kits or SDKs) to
add functionality to the app without the need to develop it in-house. Android apps and third-party
components embedded in them are often interested in accessing such data, as the online ecosystem
is dominated by data-driven business models and revenue streams like advertising.
The research community has developed many methods for analyzing the privacy
and security risks of mobile apps, most of which rely on two techniques: static code analysis and dynamic
runtime analysis. Static analysis inspects the code and other resources of an app, without executing it,
to detect potential app behaviors. While this makes static analysis easier to scale, it has drawbacks, such as
missing app behaviors when developers obfuscate the app’s code to avoid scrutiny. Furthermore,
since static analysis only shows potential app behavior, its findings need to be confirmed, as it can
report false positives due to dead or legacy code. Dynamic analysis analyzes the apps at runtime to
provide actual evidence of their behavior. However, these techniques are harder to scale as they need
to be run on an instrumented device to collect runtime data. In addition, the app must be stimulated
with realistic inputs so that as many code paths as possible are exercised. While there are some
automatic techniques to generate synthetic inputs, they have been shown to be insufficient.
In this thesis, we explore the benefits of combining static and dynamic analysis techniques to
complement each other and reduce their limitations. While most previous work has relied on
using these techniques in isolation, we combine their strengths in novel ways that allow
us to study different privacy issues in the Android ecosystem. Namely, we demonstrate the
potential of combining these complementary methods to study three inter-related issues:
• A regulatory analysis of parental control apps. We use a novel methodology that relies on
easy-to-scale static analysis techniques to pinpoint potential privacy issues and violations of
current legislation by Android apps and their embedded SDKs. We rely on the results from our
static analysis to inform the way in which we manually exercise the apps, maximizing our ability
to obtain real evidence of these misbehaviors. We study 46 publicly available apps and find
instances of data collection and sharing without consent and insecure network transmissions
containing personal data. We also see that these apps fail to properly disclose these practices
in their privacy policy.
• A security analysis of unauthorized access to permission-protected data without user consent.
We use a novel technique that combines the strengths of static and dynamic analysis. We
first compare the data sent by applications at runtime with the permissions granted to each
app in order to find instances of potential unauthorized access to permission-protected data.
Once we have discovered the apps that are accessing personal data without permission, we
statically analyze their code in order to discover the covert and side channels used by apps and
SDKs to circumvent the permission system. This methodology allows us to discover apps using
the MAC address as a surrogate for location data, two SDKs using the external storage as a
covert channel to share unique identifiers, and an app using picture metadata to gain unauthorized
access to location data.
• A novel SDK detection methodology that relies on signals observed both in the app’s
code and static resources and in its runtime behavior. We then rely on a tree structure
together with a confidence-based system to accurately detect SDK presence without the need
for any a priori knowledge and with the ability to discern whether a given SDK is part of legacy
or dead code. We show that this novel methodology can discover third-party SDKs more
accurately than state-of-the-art tools, both on a set of purpose-built ground-truth apps and on a
dataset of 5k publicly available apps.
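The first step of the second case study above (comparing the data an app sends at runtime with the permissions it holds) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name, the data-type labels, and the small mapping from data types to permissions are assumptions made for illustration, and only the Android permission strings are real identifiers.

```python
# Toy mapping (an assumption for illustration): data type seen in traffic
# -> Android permissions that would legitimately grant access to it.
PROTECTING_PERMISSIONS = {
    "location": {
        "android.permission.ACCESS_FINE_LOCATION",
        "android.permission.ACCESS_COARSE_LOCATION",
    },
    "phone_id": {"android.permission.READ_PHONE_STATE"},
    "contacts": {"android.permission.READ_CONTACTS"},
}


def find_unauthorized_access(observed_data_types, granted_permissions):
    """Return the data types an app transmitted without holding any
    permission that protects them (hypothetical helper)."""
    granted = set(granted_permissions)
    flagged = []
    for data_type in sorted(observed_data_types):
        protecting = PROTECTING_PERMISSIONS.get(data_type)
        if protecting is None:
            continue  # not permission-protected in this toy mapping
        if not (protecting & granted):
            flagged.append(data_type)
    return flagged


if __name__ == "__main__":
    # The app was seen sending location and a phone identifier,
    # but it only holds READ_PHONE_STATE.
    observed = {"location", "phone_id"}
    granted = {"android.permission.READ_PHONE_STATE"}
    print(find_unauthorized_access(observed, granted))
```

An app flagged this way is only a candidate: as the case study describes, its code would still have to be analyzed statically to find the covert or side channel that explains the access.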
With these three case studies, we are able to highlight the benefits of combining static and dynamic
analysis techniques for the study of the privacy and security guarantees and risks of Android
apps and third-party SDKs. Used in isolation, these techniques would not have allowed us to
investigate these privacy issues in depth, as we would lack the ability to provide real evidence of potential
breaches of legislation, to pinpoint the specific way in which apps leverage covert and side
channels to break Android’s permission system, or to adapt to an ever-changing
ecosystem of Android third-party companies.
The works presented in this thesis were partially funded within the framework of the following projects
and grants:
• European Union’s Horizon 2020 Innovation Action program (Grant Agreement No. 786741,
SMOOTH Project and Grant Agreement No. 101021377, TRUST AWARE Project).
• Spanish Government ODIO NºPID2019-111429RB-C21/PID2019-111429RBC22.
• The Spanish Data Protection Agency (AEPD)
• AppCensus Inc.
This work has been supported by IMDEA Networks Institute.
Doctoral Program in Telematic Engineering, Universidad Carlos III de Madrid.
President: Srdjan Matic. Secretary: Guillermo Suárez-Tangil. Member: Ben Stoc
Study on privacy of parental control mobile applications
Parental control applications are mobile software programs that parents use
to monitor and control how their children use their cellphones.
Parents install this type of app on their children's phones in order to
remotely set rules for what the children can do with their device and to monitor
where the phone is and what their children are using it for.
This type of application is highly intrusive, because it gathers all kinds of
data that reflect private information about the users, such as their Internet history,
text messages, calls, and location. Furthermore, these data are often sent to servers
hosted on the Internet, where the information is stored so that the parent can
access it later from a device other than their child's phone.
Therefore, these systems make it possible (intentionally or not) for third parties
to access all or part of these private data. From a privacy standpoint, these
applications pose a great threat to users. However, there has not yet been a study
on the amount and kind of information that these apps gather and the security of
their communications.
In this thesis, we study how parental control apps behave and the features that
they provide to the user, and we conduct several studies to understand their
privacy implications. First, we examine how well these applications explain their
behavior and operation to their users. Second, we gather information about the
permissions that these apps request and compare them to their behavior. Third,
we study what information is gathered and later sent to Internet servers by these
programs. Finally, we investigate whether this information is sent securely.
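The third and fourth studies (what information is sent to Internet servers, and whether it is sent securely) can be illustrated with a minimal sketch. It assumes that captured traffic has already been reduced to pairs of destination URL and detected personal data types. That input format, the function name, and the example hosts are hypothetical and are not taken from the thesis.

```python
from urllib.parse import urlsplit


def summarize_flows(flows):
    """Summarize captured flows (hypothetical input format).

    flows: iterable of (url, data_types) pairs, where data_types is the set
    of personal data types detected in that request.
    Returns (all data types sent, URLs that carried personal data without HTTPS).
    """
    sent = set()
    insecure_urls = []
    for url, data_types in flows:
        if not data_types:
            continue  # no personal data detected in this flow
        sent.update(data_types)
        if urlsplit(url).scheme.lower() != "https":
            insecure_urls.append(url)
    return sorted(sent), insecure_urls


if __name__ == "__main__":
    # Example hosts are placeholders, not real services.
    captured = [
        ("http://api.parental.example/upload", {"location"}),
        ("https://api.parental.example/sync", {"contacts", "sms"}),
        ("https://api.parental.example/ping", set()),
    ]
    sent, insecure = summarize_flows(captured)
    print("data types sent:", sent)
    print("sent without HTTPS:", insecure)
```

Checking the URL scheme only detects the absence of TLS; it says nothing about whether an HTTPS connection is correctly validated, which would need a separate check.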
We study fourteen different parental control applications, although we could only
perform all of the studies described above on seven of them. In each of those
seven apps we find at least one of these privacy issues: the sending of sensitive
information to third parties, the leakage of private data that predates the parental
control app's installation, the sending of private information before the user agrees
to the terms of use (in some cases the user never agrees to them at all),
or the sending of private data via an insecure communication channel. We also find
that 50% of the fourteen studied applications do not clearly explain to the user
that their children's data are being sent through the Internet and stored on servers.
Finally, we categorize 15% of the requested permissions in eleven of the fourteen
applications as confusing and probably unnecessary.